In this article, we study a model for error-correcting codes that comes from spin
glass theory and leads to both new codes and a new decoding technique. Using the
theory of spin glasses, it has been proven that a simple construction yields a family
of binary codes whose performance asymptotically approaches the Shannon bound
for the Gaussian channel. The limit is approached as the number of information
bits per codeword approaches infinity while the rate of the code approaches zero.
Thus, the codes rapidly become impractical. We present simulation results that
show the performance of a few manageable examples of these codes.

In the correspondence that exists between spin glasses and error-correcting codes,
the concept of a thermal average leads to a method of decoding that differs from
the standard method of finding the most likely information sequence for a given
received codeword. Whereas the standard method corresponds to calculating the
thermal average at temperature zero, calculating the thermal average at a certain
optimum temperature results instead in the sequence of most likely information 
bits. Since linear block codes and convolutional codes can be viewed as examples of 
spin glasses, this new decoding method can be used to decode these codes in a way 
that minimizes the bit error rate instead of the codeword error rate. We present 
simulation results that show a small improvement in bit error rate by using the 
thermal average technique. 


I. Introduction 

In a 1989 article in Nature [2], Nicolas Sourlas claimed that by using an Ising spin glass model he could 
construct a family of error-correcting codes whose performance asymptotically approached the Shannon 
coding bound. In 1993, Pal Rujan proposed an idea in Physical Review Letters [6] for decoding spin glass 
codes with a lower resulting bit error rate than could be obtained by finding the most likely codeword, 
and claimed that the method could also be used for convolutional codes. In this article, we study both 
of these claims. 

This article is organized as follows: In Section II, we introduce Ising spin glasses, which are just 
collections of particles with spin ±1. We also briefly discuss the associated concepts of energy, thermal 
equilibrium, ground states, magnetization, phase transitions, and gauge invariance. In Section III, we 
explain the connection between Ising spin glasses and binary error-correcting codes on an additive white 
Gaussian noise channel. As an example, we will show that spin glasses with a certain type of interactions 
are equivalent to convolutional codes. We show in Section IV how the physical properties of spin glasses 
can be used to prove that a family of error-correcting codes based on spin glasses has a bit error rate 
approaching zero while the rate approaches the Shannon capacity. Although the construction is simple,
it will be clear that the codes rapidly grow too large to be practical. In Section V, we present simulation 
results that show the performance of a few manageable examples of these codes. Here, decoding is done 
by finding the ground state of a spin glass, which corresponds to finding the most likely codeword. 

In Section VI, we consider a different decoding technique, based on the concept of a thermal average. 
We show that this method of decoding at an optimum, nonzero temperature minimizes bit error rate, 
as opposed to codeword error rate. We present simulation results that verify a small decrease in bit 
error rate for a few examples of spin glass codes, and also for the (8,4) Hamming code and the (24,12) 
Golay code, both of which can be viewed as spin glass codes. In Section VII, we demonstrate a method 
for decoding convolutional codes at nonzero temperatures. The method, based on the transfer matrix 
method of statistical mechanics, is similar to a Viterbi decoder, which it reduces to in the case of zero 
temperature. We present simulation results that show a small decrease in bit error rate for a convolutional 
code decoded at the optimum temperature. Finally, in Section VIII, we present our conclusions. 


II. Properties of Spin Glasses 

An Ising spin glass is a set of N particles with spin ±1 [1]. The energy of a spin glass depends on 
the values of the spins and the strengths of the interactions among the particles. For instance, if there 
is a positive interaction between two particles, then the energy contribution from that interaction will be 
lower if the spins are the same and higher if they are opposite. In general, the strength of the interaction 
between any $p$ particles $i_1, \ldots, i_p$ is given by the coupling coefficient $J_{i_1,\ldots,i_p}$. Let $s_i$ be the spin of the
$i$th particle and $S = \{s_1, \ldots, s_N\}$ a configuration of the $N$ spins. The energy of the whole system when
it has configuration $S$ is given by the Hamiltonian

$$H(S) = -\sum_{p} \; \sum_{\{i_1,\ldots,i_p\} \subset \{1,\ldots,N\}} C_{i_1,\ldots,i_p} J_{i_1,\ldots,i_p} s_{i_1} \cdots s_{i_p} \qquad (1)$$

where the connectivity matrix $C_{i_1,\ldots,i_p}$ is 1 if the $p$ particles interact and 0 otherwise. (Each
subset is understood to appear only once in the sum.) We will assume that the coupling coefficients
are given by independent identically distributed (i.i.d.) random Gaussian variables with mean $J_0$ and
variance $\sigma_J^2$.

The physical situation that we are considering here is that the interaction strengths are fixed while 
the spins are free to change. Let T be the temperature of the system. Then the probability of the system 
being in a particular configuration S at equilibrium is given by the Gibbs distribution [1]: 

$$p(S) = \frac{e^{-(1/kT)H(S)}}{Z} \qquad (2)$$

where $k$ is Boltzmann's constant and the partition function $Z$ (a normalizing constant) is given by

$$Z = \sum_{S} e^{-(1/kT)H(S)} \qquad (3)$$

As $T \to 0$, we see that $p(S) \to 0$, except for $S$ such that $H(S)$ is a minimum. Such a minimizing
configuration $S$ is known as a ground state. Unfortunately, in many cases the ground state is degenerate,
meaning that there may be multiple configurations of spins that have the same minimum energy. For
instance, consider the case $p = 2$: For any configuration, the energy of the configuration obtained by
reversing all the spins will be the same.
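
The definitions in Eqs. (1) through (3) can be checked by brute force on a very small system. The following Python sketch is illustrative only: the three couplings are hypothetical values, indices are 0-based, and the quantity `kT` stands for the product of Boltzmann's constant and the temperature. It shows that for p = 2 the most probable configuration and its fully reversed copy have the same energy.

```python
import itertools
import math


def energy(spins, couplings):
    """Eq. (1): H(S) = -sum of J * (product of the spins in each interacting subset)."""
    total = 0.0
    for subset, J in couplings.items():
        prod = 1
        for i in subset:
            prod *= spins[i]
        total += J * prod
    return -total


def gibbs_distribution(couplings, n, kT):
    """Eqs. (2)-(3) by brute force: returns {configuration: probability}."""
    configs = list(itertools.product((+1, -1), repeat=n))
    weights = [math.exp(-energy(s, couplings) / kT) for s in configs]
    Z = sum(weights)  # partition function
    return {s: w / Z for s, w in zip(configs, weights)}


# Hypothetical p = 2 couplings on N = 3 particles (0-based indices).
couplings = {(0, 1): 1.0, (1, 2): 0.5, (0, 2): -0.3}
dist = gibbs_distribution(couplings, n=3, kT=0.1)
ground = max(dist, key=dist.get)      # most probable configuration at low temperature
flipped = tuple(-s for s in ground)   # the same configuration with every spin reversed
print(ground, energy(ground, couplings), energy(flipped, couplings))
```

Enumerating all configurations costs 2^N evaluations, so this is only workable for very small N.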





The magnetization $m(S)$ of a configuration $S$ is simply the average spin: $m(S) = (1/N) \sum_i s_i$. In this
article, we will need to consider limits as the number of particles $N \to \infty$. In this situation, there can
occur a phase transition, which is a sort of discontinuity in some global characteristic of the system as a
function of some continuous parameter(s). Specifically, for the spin glass model we have described, there
is a phase transition at zero temperature at a particular critical value of $J_0/\sigma_J$: For $J_0/\sigma_J$ below this
cutoff, the magnetization $m$ (of the ground state, since $T = 0$) is zero, whereas above it, $m > 0$. We
will dismiss the problem of degenerate ground states by merely stating that there are ways to make the 
ground state unique. The existence of the phase transition also requires that the connectivity matrix not 
be too sparse. This condition will be satisfied in the cases for which we will invoke a phase transition. 

We now define one more property of our spin glass model, called gauge invariance. Let $\{\epsilon_1, \ldots, \epsilon_N\}$ be
an arbitrary configuration of Ising spins, that is, $\epsilon_i = \pm 1$. A system is gauge invariant if the configuration
space is invariant under the transformation $s_i \to s_i \epsilon_i$ and if the Hamiltonian is invariant under this
transformation and the simultaneous transformation $J_{i_1,\ldots,i_p} \to J_{i_1,\ldots,i_p} \epsilon_{i_1} \cdots \epsilon_{i_p}$. Clearly, our model is
gauge invariant, since the result of multiplying any sequence of Ising spins by another arbitrary sequence
of Ising spins is yet another sequence of Ising spins, and Eq. (1) is unchanged if both transformations are 
applied simultaneously. 
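
Gauge invariance of the Hamiltonian can be confirmed numerically. The sketch below is a minimal check under assumed values: a hypothetical fully connected system with N = 5 and p = 3, Gaussian couplings with an arbitrary mean and standard deviation, and helper names (`energy`, `gauge_transform`) that are chosen here for illustration.

```python
import itertools
import random


def energy(spins, couplings):
    """H(S) = -sum of J * (product of the spins in each interacting subset)."""
    total = 0.0
    for subset, J in couplings.items():
        prod = 1
        for i in subset:
            prod *= spins[i]
        total += J * prod
    return -total


def gauge_transform(spins, couplings, eps):
    """Apply s_i -> s_i * eps_i and J -> J * eps_{i_1} * ... * eps_{i_p}."""
    new_spins = tuple(s * e for s, e in zip(spins, eps))
    new_couplings = {}
    for subset, J in couplings.items():
        factor = 1
        for i in subset:
            factor *= eps[i]
        new_couplings[subset] = J * factor
    return new_spins, new_couplings


rng = random.Random(0)
n, p = 5, 3  # hypothetical sizes
couplings = {sub: rng.gauss(1.0, 0.5) for sub in itertools.combinations(range(n), p)}
spins = tuple(rng.choice((+1, -1)) for _ in range(n))
eps = tuple(rng.choice((+1, -1)) for _ in range(n))
g_spins, g_couplings = gauge_transform(spins, couplings, eps)
print(energy(spins, couplings), energy(g_spins, g_couplings))  # the two energies agree
```

Each term picks up every factor of epsilon twice, once from the spin and once from the coupling, so the product is unchanged.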


III. Using Spin Glass Models as Error-Correcting Codes 

In a 1989 article in Nature [2], Nicolas Sourlas suggested using an Ising spin glass model to construct 
error-correcting codes. In this section, we describe his proposal. For simplicity, he considered only the
special case where interactions are restricted to a single value of $p$. Thus, we have the
Hamiltonian 


$$H(S) = -\sum_{\{i_1,\ldots,i_p\}} C_{i_1,\ldots,i_p} J_{i_1,\ldots,i_p} s_{i_1} \cdots s_{i_p} \qquad (4)$$


Let $\{a_1, \ldots, a_N\}$, $a_i = \pm 1$, be an $N$-bit information sequence. Let $J_{i_1,\ldots,i_p} = a_{i_1} \cdots a_{i_p}$ whenever
$C_{i_1,\ldots,i_p} = 1$. Then the set of spins corresponding to the data will be the ground state of the Hamiltonian
with these coupling coefficients. Thus, our spin glass model yields a code, with codewords given by the
computed set of coupling coefficients and decoding done by finding the ground state of the Hamiltonian
specified by the coupling coefficients. Before finding the rate, we make one more simplification: Assume
that the coordination number $z_i = \sum_{\{j_2,\ldots,j_p\}} C_{i,j_2,\ldots,j_p} = z$ is independent of $i$. (Notice that $C_{j_1,j_2,\ldots,j_p}$ is
invariant under permutations of its indices, so that $z_i$ is the number of interacting subsets that include
$s_i$.) Then the code rate is given by $R = p/z$.
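
As a small illustration of the encoding rule and the rate formula, the sketch below encodes a hypothetical 5-bit sequence with p = 3. It assumes full connectivity (every subset of p spins interacts), which is one particular choice of connectivity matrix, and the function names are illustrative.

```python
import itertools


def encode(bits, p):
    """Map information spins a_i = +/-1 to couplings J = a_{i_1} * ... * a_{i_p}.

    Assumes every subset of p positions interacts (C = 1 everywhere).
    """
    codeword = {}
    for subset in itertools.combinations(range(len(bits)), p):
        prod = 1
        for i in subset:
            prod *= bits[i]
        codeword[subset] = prod
    return codeword


def coordination_number(codeword, i):
    """Number of interacting subsets that include spin i."""
    return sum(1 for subset in codeword if i in subset)


bits = (+1, -1, -1, +1, +1)   # hypothetical information sequence
codeword = encode(bits, p=3)
z = coordination_number(codeword, 0)
rate = len(bits) / len(codeword)
print(len(codeword), z, rate, 3 / z)
```

Here the codeword has 10 symbols, each spin appears in z = 6 of them, and the rate 5/10 equals p/z = 3/6.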

Now we consider the issue of noise. Let us assume that the transmitted codeword symbols have
magnitude $\pm v$ and duration $\tau$, and are subject to additive white Gaussian noise (AWGN) of spectral density
$N_0$. Then the noise is included in the model via the already discussed variance $\sigma_J^2$ of the coupling
coefficients, with corresponding channel SNR $= E_s/N_0 = v^2\tau/N_0 = J_0^2/2\sigma_J^2$. For $J_0 \ll \sigma_J$, or equivalently,
$v^2\tau \ll N_0$, the channel thus has capacity $C = (1/\ln 2) J_0^2/2\sigma_J^2$ bits per coupling coefficient [3].

We are ultimately concerned with the probability of decoded bit error $P_b$, so we now consider the
corresponding quantity in the spin glass model. This is where gauge invariance comes in: It allows us to
assume, without loss of generality, that the spins are all $+1$. Then, $P_b$ is just the probability that a bit is
decoded as $-1$. Since the ground state magnetization $m$ is given by $m = (+1)\Pr\{s_i = +1\} + (-1)\Pr\{s_i = -1\}
= (+1)(1 - P_b) + (-1)P_b$, we have $P_b = (1 - m)/2$. In the next section, we will describe a situation
where $m \to 1$, and hence $P_b \to 0$.

As a familiar example, consider a binary $(n, 1)$ convolutional code [3]. For simplicity, assume that each
of its $n$ generating polynomials $\{g_1(x), \ldots, g_n(x)\}$ has exactly $p$ terms. It can be described by our model
by letting $N = \infty$, and $C_{i_1+k,\ldots,i_p+k} = 1$ if and only if $g_j(x) = x^{i_1} + \cdots + x^{i_p}$ for some $j$. For instance, if
$n = p = 2$, with $g_1(x) = 1 + x$ and $g_2(x) = 1 + x^2$, then the corresponding log likelihood formula,

$$\log P \propto \sum_i \left( J_{i,i-1} s_i s_{i-1} + J_{i,i-2} s_i s_{i-2} \right) + \text{constant} \qquad (5)$$


that is used in Viterbi decoding is equivalent to the Hamiltonian we have defined. The code can be 
visualized as a one-dimensional infinite spin glass with short-range, translation invariant interactions. 


IV. Approaching the Shannon Bound With a Spin Glass Model 

We now consider a special case, known as Derrida’s random energy model [4], that is soluble. This 
means that the behavior of certain measurable quantities, such as the magnetization as a function of 
$J_0/\sigma_J$ and temperature $T$, can be calculated. Again, we restrict interactions to a single value of $p$, so
that our Hamiltonian is given by Eq. (4). We assume that the connectivity is extensive, i.e., $C_{i_1,\ldots,i_p} = 1$
for every subset $\{i_1,\ldots,i_p\}$.
Because we will be letting $N \to \infty$, we will use the scaled variables $j_0$ and $\sigma_j$, with $J_0 = (p!/N^{p-1}) j_0$ and
$\sigma_J^2 = (p!/N^{p-1}) \sigma_j^2$, in order for the relevant quantities to remain finite. We then normalize by setting
$\sigma_j = 1$, and so the channel SNR $= (1/2) j_0^2 \, p!/N^{p-1}$. As $N \to \infty$, the asymptotic rate $R = p!/N^{p-1}$ and
the capacity $C = (j_0^2/2\ln 2) \, p!/N^{p-1}$ bits per coupling coefficient.

In Derrida's random energy model, the number of interacting particles $p \to \infty$ and $p/N \to 0$. He showed that
in this case there is a phase transition for $T < 1/(2\sqrt{\ln 2})$ at $j_0 = \sqrt{2 \ln 2}$ from a spin glass phase
with $m = 0$ to a ferromagnetic phase with $m = 1$. Thus, for $j_0 > \sqrt{2 \ln 2}$, the probability of error
$P_e$ can be made arbitrarily small. Therefore, we can code at a rate arbitrarily close to capacity with
$P_e \to 0$. Although $R \to 0$, we note that $R'$, the rate in bits per sec, is given by $R' = R/\tau$. Since
$v^2\tau/N_0 = J_0^2/2\sigma_J^2 = (1/2) j_0^2 R = R \ln 2$ for $j_0^2 = 2 \ln 2$, it follows that $R' = (1/\ln 2) v^2/N_0$. This means that the rate
in bits per sec remains a constant as a function of transmitter power $P = v^2$ and noise power $N_0$ as
$P_e \to 0$. The capacity $C'$ in terms of bits per sec is $C' = (1/\ln 2) v^2/N_0$ [3]. The bound is approached for
codeword lengths approximately equal to $N^p$, where both $N$ and $p$ approach infinity, so clearly the codes
rapidly become impractical. However, we do have an explicit construction for a family of codes whose
performance approaches the Shannon limit, and it is possible to simulate them, as described in the next
section. Spin glass theory does give us some hint as to what we can expect: It can be shown that for
large $p$, $m \approx 1 - (2^{-p}/\sqrt{p})$ for codes satisfying the capacity constraint.
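
To make the growth in code size concrete, the following sketch tabulates the codeword length, the exact rate N divided by the number of p-subsets, and the asymptotic rate p!/N^(p-1) for a few (N, p) pairs. The pairs are chosen only for illustration. The helper `threshold_snr_db` applies the relation SNR = R ln 2 at j_0^2 = 2 ln 2 to a finite-N rate, which is a heuristic assumption, since the relation is derived for the limit of large N.

```python
import math


def code_parameters(n, p):
    """Codeword length, exact rate, and asymptotic rate for a fully connected (N, p) code."""
    length = math.comb(n, p)                             # number of coupling coefficients
    rate = n / length                                    # information bits per coefficient
    asymptotic_rate = math.factorial(p) / n ** (p - 1)   # p!/N^(p-1)
    return length, rate, asymptotic_rate


def threshold_snr_db(rate):
    """Channel SNR in dB at j_0^2 = 2 ln 2, using SNR = (1/2) j_0^2 R = R ln 2."""
    return 10 * math.log10(rate * math.log(2))


for n, p in [(8, 3), (12, 3), (16, 3), (20, 3), (20, 5)]:
    length, rate, asym = code_parameters(n, p)
    print(f"(N,p)=({n},{p}): length={length}, rate={rate:.5f}, "
          f"p!/N^(p-1)={asym:.5f}, threshold SNR={threshold_snr_db(rate):.1f} dB")
```

For small N the exact rate is noticeably larger than the asymptotic expression, because the number of p-subsets is well below N^p/p!.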


V. Decoding by Finding the Ground State 

In this section, we discuss simulations of decoding by finding the ground state for a few examples of 
the family of codes described in the previous section. The size of the codes grows so quickly with N and p 
that only fairly small values could be used. Two different methods for finding the ground state were used: 
exhaustive search and simulated annealing. By using efficient recursive algorithms for exhaustive search 
and letting simulations run for over a week in some cases, it was possible to test codes with parameters 
as high as N = 20 and p = 5. Results are shown in Fig. 1, where the curves are labeled as (N,p). 
For the four codes with N = 8 and 12, each point represents 2500 codeword error events. For the two
N = 16 codes and the (N,p) = (20,3) code, each point represents 100 codeword error events, and for the
(20,5) code, only 25 codeword errors were obtained. The reason for using odd values of p was to avoid 
the degenerate ground states that result when p is even, and no other problems from degenerate ground 
states appeared. 

Fig. 1. Bit error rates of spin glass codes with varying parameters (N,p).
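
A minimal sketch of ground-state decoding by exhaustive search is given below. It is not the efficient recursive algorithm used for the simulations: it simply evaluates the energy of all 2^N configurations. The code size (N = 6, p = 3), the noise level `sigma`, and all function names are assumptions made for illustration.

```python
import itertools
import random


def encode(bits, p):
    """Fully connected spin glass code: J = product of the p information spins."""
    codeword = {}
    for subset in itertools.combinations(range(len(bits)), p):
        prod = 1
        for i in subset:
            prod *= bits[i]
        codeword[subset] = prod
    return codeword


def add_noise(codeword, sigma, rng):
    """AWGN channel: add independent zero-mean Gaussian noise to every coupling."""
    return {sub: J + rng.gauss(0.0, sigma) for sub, J in codeword.items()}


def energy(spins, couplings):
    total = 0.0
    for subset, J in couplings.items():
        prod = 1
        for i in subset:
            prod *= spins[i]
        total += J * prod
    return -total


def ground_state(couplings, n):
    """Naive exhaustive search over all 2^n spin configurations."""
    return min(itertools.product((+1, -1), repeat=n),
               key=lambda s: energy(s, couplings))


rng = random.Random(1)
bits = (+1, -1, -1, +1, +1, -1)                      # hypothetical information bits
received = add_noise(encode(bits, 3), sigma=0.5, rng=rng)
decoded = ground_state(received, len(bits))
bit_errors = sum(d != b for d, b in zip(decoded, bits))
print(decoded, bit_errors)
```

Without noise, the ground state of this odd-p code is the transmitted sequence itself, which gives a simple sanity check for the decoder.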


In an effort to speed up decoding, simulated annealing [5] was tried. In a physical spin glass, this 
corresponds to heating it up to a relatively high temperature and then slowly reducing the temperature 
asymptotically to 0. If the dwell time, or time spent at each temperature, is large enough for the 
temperature-decrementing factor used, then the system should reach equilibrium at each temperature, 
and thus end up in the ground state for T sufficiently close to zero. Results for N = 12 and p = 
3 are shown in Fig. 2, for two different dwell times. Each point on the simulated annealing curves
represents approximately 1000 codeword errors, and the exhaustive search curve is taken from Fig. 1. 
Unfortunately, simulated annealing took even longer than the exhaustive search. This was true even for 
larger parameters, such as N = 20 and p = 5, because much longer dwell times were necessary to get 
reasonable performance. It is possible that further customization of the simulated annealing algorithm 
for the particular characteristics of this problem could result in significant improvement, but the potential 
benefits did not seem to justify the additional effort at this time. 
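
The annealing procedure described above can be sketched as follows. This is a generic single-spin-flip Metropolis annealer with geometric cooling, not the implementation used for Fig. 2. The starting and final temperatures, the cooling factor, the dwell time, and the choice to return the best configuration seen so far are all assumptions, and temperature is measured in units where k = 1.

```python
import itertools
import math
import random


def energy(spins, couplings):
    total = 0.0
    for subset, J in couplings.items():
        prod = 1
        for i in subset:
            prod *= spins[i]
        total += J * prod
    return -total


def simulated_annealing(couplings, n, t_start=2.0, t_end=0.05,
                        cooling=0.9, dwell=50, seed=0):
    """Metropolis annealing; returns the lowest-energy configuration visited."""
    rng = random.Random(seed)
    spins = [rng.choice((+1, -1)) for _ in range(n)]
    current = energy(spins, couplings)
    best, best_energy = list(spins), current
    t = t_start
    while t > t_end:
        for _ in range(dwell):                 # dwell time at this temperature
            i = rng.randrange(n)
            spins[i] = -spins[i]               # propose flipping one spin
            proposed = energy(spins, couplings)
            delta = proposed - current
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = proposed             # accept the move
                if current < best_energy:
                    best, best_energy = list(spins), current
            else:
                spins[i] = -spins[i]           # reject: undo the flip
        t *= cooling                           # temperature-decrementing factor
    return tuple(best), best_energy


# Hypothetical noisy (N, p) = (6, 3) codeword for the all +1 information sequence.
rng = random.Random(3)
n = 6
couplings = {sub: 1.0 + rng.gauss(0.0, 0.5)
             for sub in itertools.combinations(range(n), 3)}
state, state_energy = simulated_annealing(couplings, n)
print(state, state_energy)
```

Recomputing the full energy after every flip is wasteful; a practical version would update only the terms that contain the flipped spin.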


VI. Decoding at Nonzero Temperatures 

After reading Sourlas’ article, Pal Rujan proposed an idea for decoding spin glass codes with a lower 
resulting bit error rate than could be obtained by finding the ground state [6]. He showed that the effect 
of the channel was equivalent to heating up a spin glass to a particular temperature $T_N$ (the Nishimori
temperature) [7]. For the model we have described with AWGN, $T_N = \sigma_J^2/kJ_0$. This suggested decoding
by computing the thermal average at $T_N$ of the Hamiltonian given by the received codeword. The thermal
average is the average over all spin configurations, weighted by the Gibbs distribution, so that the decoded
value of the $i$th spin, $s_i$, is given by $(1 - m_i)/2$, where $m_i$ is the averaged magnetization of the $i$th particle,
given by

$$m_i = \frac{\sum_S s_i e^{-\beta_N H(S)}}{\sum_S e^{-\beta_N H(S)}} \qquad (6)$$

with $\beta_N = 1/(kT_N) = J_0/\sigma_J^2$. Notice that finding the ground state is equivalent to computing the thermal
average at $T = 0$.



Fig. 2. Performance of simulated annealing versus exhaustive search for decoding a (12,3) spin glass code.


Rujan presented results of a simulation that showed a small decrease in bit error rate at $T = T_N$
compared to $T = 0$ [6]. Hidetoshi Nishimori responded by proving that the bit error rate was indeed
lower at $T = T_N$ than at $T = 0$ [9]. In fact, he showed that this is true not only for AWGN, but for any
noise with distribution $f(|J|)e^{aJ}$.

Rujan's results seemed suspicious at first glance, because Viterbi decoding is known to be optimum.
The difference is that Viterbi decoding finds the most likely sequence of input bits, but not necessarily the
sequence of most likely input bits. Although the formula above came from considerations of statistical
mechanics, it is easy to derive using just Bayes' formula. We start by expressing $m_i$ as

$$m_i = \sum_S s_i \Pr\{S \mid \{J_{i_1,\ldots,i_p}\}\} \qquad (7)$$

(Again, we are assuming for simplicity of notation a single value of $p$.) Using Bayes' formula, we get
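
$$\Pr\{S \mid \{J_{i_1,\ldots,i_p}\}\} = \frac{\Pr\{\{J_{i_1,\ldots,i_p}\} \mid S\} \Pr\{S\}}{\sum_{S'} \Pr\{\{J_{i_1,\ldots,i_p}\} \mid S'\} \Pr\{S'\}} \qquad (8)$$

$\Pr\{S\} = 1/2^N$ for all $S$, and

$$\Pr\{\{J_{i_1,\ldots,i_p}\} \mid S\} \propto \prod_{\{i_1,\ldots,i_p\}} e^{-(J_{i_1,\ldots,i_p} - J_0 s_{i_1} \cdots s_{i_p})^2/2\sigma_J^2} = A e^{(J_0/\sigma_J^2) \sum_{\{i_1,\ldots,i_p\}} J_{i_1,\ldots,i_p} s_{i_1} \cdots s_{i_p}} = A e^{-\beta_N H(S)} \qquad (9)$$

where the product and sum run over the interacting subsets and $A$ is independent of $S$ (using $s_i^2 = 1$). Finally, substituting Eq. (9) into Eq. (8) and then Eq. (8)
into Eq. (7) yields Eq. (6), the desired result.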


In fact, we can now show that Rujan’s method actually minimizes the bit error rate. From Eq. (7), 


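$$\begin{aligned} m_i &= \sum_{S:\, s_i = +1} s_i \Pr\{S \mid \{J_{i_1,\ldots,i_p}\}\} + \sum_{S:\, s_i = -1} s_i \Pr\{S \mid \{J_{i_1,\ldots,i_p}\}\} \\ &= (+1)\Pr\{s_i = +1 \mid \{J_{i_1,\ldots,i_p}\}\} + (-1)\Pr\{s_i = -1 \mid \{J_{i_1,\ldots,i_p}\}\} \end{aligned} \qquad (10)$$

so $m_i > 0$ if and only if $\Pr\{s_i = +1 \mid \{J_{i_1,\ldots,i_p}\}\} > \Pr\{s_i = -1 \mid \{J_{i_1,\ldots,i_p}\}\}$. Choosing the sign of $m_i$
therefore selects the more probable value of each bit.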

Figure 3 shows the results of decoding simulations for a few of the spin glass codes described in 
Section IV. Each point represents 10,000 codeword error events. The bit error rate is lower for $T = T_N$
than for $T = 0$, although the difference is small. Figure 4 shows similar results for the (24,12) Golay
code and the (8,4) Hamming code. Again, each point represents 10,000 codeword error events. In these 
simulations, calculations were done by directly computing the sum in Eq. (6), using efficient recursive 
algorithms. Of course, for anything but the smallest codes, using Eq. (6) directly is impractical. As 
mentioned previously, however, for convolutional codes one can use Rujan’s transfer matrix method, 
which will be discussed in the next section. 
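
The direct evaluation of Eq. (6) can be sketched as a brute-force sum over all configurations. This is a naive illustration rather than the efficient recursive algorithm mentioned above. It assumes couplings with mean J_0 = 1, so that the Nishimori inverse temperature is 1/sigma^2, a small fully connected (5, 3) code, and decoding by the sign of m_i. Energies are shifted by their minimum before exponentiation to avoid overflow.

```python
import itertools
import math
import random


def energy(spins, couplings):
    total = 0.0
    for subset, J in couplings.items():
        prod = 1
        for i in subset:
            prod *= spins[i]
        total += J * prod
    return -total


def thermal_average(couplings, n, beta):
    """Brute-force evaluation of the magnetizations m_i of Eq. (6)."""
    configs = list(itertools.product((+1, -1), repeat=n))
    energies = [energy(s, couplings) for s in configs]
    e_min = min(energies)  # shifting by a constant leaves the ratio unchanged
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    Z = sum(weights)
    return [sum(w * s[i] for w, s in zip(weights, configs)) / Z
            for i in range(n)]


def decode(couplings, n, beta):
    """Bit-wise decision: the sign of each thermal average."""
    return tuple(+1 if m >= 0 else -1 for m in thermal_average(couplings, n, beta))


# Hypothetical example: (N, p) = (5, 3), J_0 = 1, noise standard deviation sigma.
rng = random.Random(4)
bits = (+1, -1, +1, +1, -1)
sigma = 0.7
clean = {sub: bits[sub[0]] * bits[sub[1]] * bits[sub[2]]
         for sub in itertools.combinations(range(5), 3)}
received = {sub: J + rng.gauss(0.0, sigma) for sub, J in clean.items()}
beta_nishimori = 1.0 / sigma ** 2          # J_0 / sigma^2 with J_0 = 1
print(decode(received, 5, beta_nishimori))  # decoding at the Nishimori temperature
print(decode(received, 5, 50.0))            # large beta approximates ground-state decoding
```

Comparing the two decoded sequences over many noise realizations would reproduce the kind of comparison shown in Figs. 3 and 4, although only for very small codes.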


Although we have shown that Eq. (6) can be derived without reference to spin glasses, the theory 
behind spin glasses might still be useful in dealing with error-correcting codes. One example, of course, is 
the use of an algorithm from statistical mechanics for decoding. The simulation results do not demonstrate 
a great improvement in performance, but theoretically, it would be interesting to have bounds on how 
much improvement is possible from minimizing the bit error rate instead of the codeword error rate. 
Perhaps spin glass theory could shed some light on this. Another idea that seems reasonable for decoding
at the Nishimori temperature is to use simulated annealing, but only decrease the temperature to $T_N$
instead of all the way to zero. 

It also appears that the theory of Markov random fields [1], which is closely related to spin glass theory,
might have implications for finite length codes and infinite codes (in one or more dimensions) with only 
short-range interactions. Codes with short-range interactions are likely to be more practical, since the 
rate would not decrease so rapidly as the number of information bits increases. Two-dimensional codes 
might be useful for encoding images, for example. 

Another interesting suggestion was pointed out by Sourlas [10]: It is easy to modify Eq. (6) in order 
to minimize the probability of error of particular blocks of input data. For instance, if the binary input 
data consisted of a sequence of $k$-bit symbols, one could minimize the symbol error rate instead of the
bit error rate or instead of the sequence error rate. 


VII. Decoding Convolutional Codes at Nonzero Temperatures 

Rujan [6] showed that for a one-dimensional spin glass with short-range interactions, such as a
convolutional code, a variant of the transfer matrix method of statistical mechanics [8] could be used to
compute the thermal average. It finds the $m_i$'s for a given temperature $T$ using a recursive algorithm
that reduces to the Viterbi algorithm at $T = 0$, and has similar complexity. We illustrate Rujan's
algorithm with a simple rate 1/2 convolutional code with generator matrix $G(x) = [1 + x^2 \quad 1 + x + x^2]$. Let
$\{K_1(1), K_2(1), K_1(2), \ldots, K_2(N)\}$ be the $2N$ received coupling coefficients, $K_1(i) = a_{i-1} a_i a_{i+1} + n_1(i)$
and $K_2(i) = a_{i-1} a_{i+1} + n_2(i)$, where $n_1(i)$ and $n_2(i)$ are i.i.d. Gaussian variables of mean zero and variance
$\sigma_J^2$. (Define $a_0 = a_{N+1} = 0$.) Then the energy can be written as

$$H(S) = \sum_{i=2}^{N-1} H_i(s_{i-1}, s_i, s_{i+1}) \qquad (11)$$


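where $H_i(s_{i-1}, s_i, s_{i+1}) = -K_1(i) s_{i-1} s_i s_{i+1} - K_2(i) s_{i-1} s_{i+1}$. The value for $m_i$ is given by

$$m_i = \frac{\sum_{s_{i-1}, s_i} s_i \, \psi_{i-1}^L(s_{i-1}, s_i) \, \psi_i^R(s_{i-1}, s_i)}{\sum_{s_{i-1}, s_i} \psi_{i-1}^L(s_{i-1}, s_i) \, \psi_i^R(s_{i-1}, s_i)} \qquad (12)$$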


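where the $\psi$'s are defined recursively: Let $\psi_1^L(s_1, s_2) = 1$ and compute

$$\psi_i^L(s_i, s_{i+1}) = \sum_{s_{i-1}} \psi_{i-1}^L(s_{i-1}, s_i) \, e^{-\beta H_i(s_{i-1}, s_i, s_{i+1})} \qquad (13)$$

for $i = 2, 3, \ldots, N$, $s_i = \pm 1$. The inverse temperature is $\beta = 1/(kT)$, so for $T = T_N$, $\beta = J_0/\sigma_J^2$. Then,
let $\psi_N^R(s_{N-1}, s_N) = 1$ and compute

$$\psi_i^R(s_{i-1}, s_i) = \sum_{s_{i+1}} \psi_{i+1}^R(s_i, s_{i+1}) \, e^{-\beta H_i(s_{i-1}, s_i, s_{i+1})} \qquad (14)$$

for $i = N-1, \ldots, 2, 1$, $s_i = \pm 1$. Finally, the decoded sequence is obtained by setting $d_i = \mathrm{sgn}(m_i)$,
$i = 1, 2, \ldots, N$, where the $m_i$'s are computed from Eq. (12).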

This algorithm can be visualized on a Viterbi decoder trellis, each column having four states,
labelled $(+1,+1)$, $(+1,-1)$, $(-1,+1)$, and $(-1,-1)$. The branch metric from $(s_{i-1}, s_i)$ to $(s_i, s_{i+1})$ is
$e^{-\beta H_i(s_{i-1}, s_i, s_{i+1})}$, and $\psi_i^L(s_i, s_{i+1})$ is the path metric at $(s_i, s_{i+1})$. The difference from the Viterbi
algorithm is that instead of the path metric being solely determined by the best incoming path, it is
a weighted sum of the path metrics of the incoming paths. After the $\psi_i^L$'s are computed from left to
right, traceback is done by simultaneously computing $\psi_i^R$ and $m_i$ from right to left. Unlike the Viterbi
algorithm, however, instead of selecting a single best path, the algorithm computes a weighted average
of all possible paths. If $T = T_N$, then this weighted average, when quantized to $\pm 1$, yields the sequence
of most likely bits, whereas the Viterbi decoder yields the most likely sequence of bits.

As T decreases, we see that the best (lowest energy) incoming path is weighted increasingly more 
heavily relative to the other incoming path, with the ratio of the two weights approaching infinity as
$T \to 0$. Thus, the algorithm reduces to the Viterbi algorithm at $T = 0$, although renormalization is
necessary to avoid all weights being infinite. The algorithm can also be modified, as the Viterbi algorithm 
usually is, to allow traceback to begin before reaching the end of the received symbols, at the cost of a 
slight loss of performance. 
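
The left-to-right and right-to-left recursions can be illustrated with a short forward-backward sketch for this code. It is written under several assumptions that are not taken from the text: indices are 0-based, only the interior energy terms (those whose three spins all lie inside the block) are used, J_0 = 1 so that the Nishimori inverse temperature is 1/sigma^2, and no renormalization is applied, so it is suitable only for short blocks. A brute-force sum is included purely as a check.

```python
import itertools
import math
import random

SPINS = (+1, -1)


def local_energy(k1, k2, a, b, c):
    """H_i(s_{i-1}, s_i, s_{i+1}) for the generators 1 + x + x^2 and 1 + x^2."""
    return -k1 * a * b * c - k2 * a * c


def transfer_matrix_magnetizations(K1, K2, beta):
    """Thermal averages m_i from left and right recursions over spin pairs."""
    n = len(K1)
    pairs = list(itertools.product(SPINS, repeat=2))
    # left[j][(a, b)]: weight of everything to the left of the pair (s_j, s_{j+1}).
    left = [dict.fromkeys(pairs, 1.0)]
    for i in range(1, n - 1):
        left.append({(b, c): sum(left[-1][(a, b)]
                                 * math.exp(-beta * local_energy(K1[i], K2[i], a, b, c))
                                 for a in SPINS)
                     for b, c in pairs})
    # right[j][(a, b)]: weight of everything to the right of the pair (s_j, s_{j+1}).
    right = [None] * (n - 1)
    right[n - 2] = dict.fromkeys(pairs, 1.0)
    for i in range(n - 3, -1, -1):
        right[i] = {(a, b): sum(right[i + 1][(b, c)]
                                * math.exp(-beta * local_energy(K1[i + 1], K2[i + 1], a, b, c))
                                for c in SPINS)
                    for a, b in pairs}
    m = []
    for i in range(n):
        j = min(i, n - 2)                  # a pair (s_j, s_{j+1}) that contains spin i
        pos = 0 if i == j else 1           # position of spin i inside that pair
        joint = {pr: left[j][pr] * right[j][pr] for pr in pairs}
        z = sum(joint.values())
        m.append(sum(pr[pos] * w for pr, w in joint.items()) / z)
    return m


def brute_force_magnetizations(K1, K2, beta):
    """Direct sum over all 2^n configurations, used only to check the recursion."""
    n = len(K1)
    z, acc = 0.0, [0.0] * n
    for s in itertools.product(SPINS, repeat=n):
        h = sum(local_energy(K1[i], K2[i], s[i - 1], s[i], s[i + 1])
                for i in range(1, n - 1))
        w = math.exp(-beta * h)
        z += w
        for i in range(n):
            acc[i] += s[i] * w
    return [a / z for a in acc]


rng = random.Random(2)
n, sigma = 8, 0.8                          # hypothetical block length and noise level
a = [rng.choice(SPINS) for _ in range(n)]
K1, K2 = [0.0] * n, [0.0] * n              # end positions are unused in this sketch
for i in range(1, n - 1):
    K1[i] = a[i - 1] * a[i] * a[i + 1] + rng.gauss(0.0, sigma)
    K2[i] = a[i - 1] * a[i + 1] + rng.gauss(0.0, sigma)
beta = 1.0 / sigma ** 2
m = transfer_matrix_magnetizations(K1, K2, beta)
decoded = [+1 if x >= 0 else -1 for x in m]
print(decoded, a)
```

The recursion touches four states per position, so its cost grows linearly with the block length, whereas the brute-force check grows as 2^N.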

Figure 5 shows the results of decoding the convolutional code described above at $T = 0.1T_N$, $T = T_N$,
and $T = 2T_N$. Each point represents approximately 10,000 bit errors. The bit error rate is lowest for
$T = T_N$, although the difference is minuscule. The case $T = 0$ was not simulated, because the reduction
to the Viterbi algorithm is not automatic, since simply using the value $T = 0$ results in dividing by zero.
However, the performance at $T = 0$ should be only microscopically worse than at $T = 0.1T_N$.



Fig. 5. Bit error rates for a rate 1/2 convolutional code decoded at three different temperatures. (Horizontal axis: $E_s/N_0$, dB.)


VIII. Conclusion 

We simulated the performance of some simple examples of a family of codes whose performance has 
been shown to asymptotically approach the Shannon bound. The codes have parameters N and p, with 
codeword length $\binom{N}{p}$ and rate $N/\binom{N}{p}$. The largest code we were able to simulate had a codeword length
of 15,504, a rate of 0.00129, and took over a week to accumulate only 25 codeword errors. Even for this
code, the performance was far from capacity. Thus, these codes, although theoretically interesting, are 
probably not particularly useful, unless radical new decoding methods are developed. One possible such 
method might be to use an actual spin glass, with coupling coefficients specified by a received codeword, 
and let it come to equilibrium; then measure the spins to obtain the decoded data. However, this would 
be, at the very least, technically extremely challenging, and might even be physically impossible. 

We also did simulations to measure the improvement in bit error rate achievable by using a decoding 
method analogous to computing the thermal average of a spin glass. The performance at the optimum 
temperature, which minimizes bit error rate, was compared to the performance of a standard decoder 
that minimizes codeword error rate. We tested some examples of the codes described in Section IV 
and also a (24,12) Golay code, an (8,4) Hamming code, and a simple rate 1/2 convolutional code. For 
the convolutional code, we used a decoding algorithm based on an algorithm from statistical mechanics 
for computing thermal averages in one-dimensional systems. For all the codes, there was a measurable 
but very small improvement in bit error rate. The improvement did not justify the increased decoding 
complexity, even for the convolutional code, where the decoding algorithm had complexity of the same 
order as the Viterbi algorithm. 